I Summary


This project focuses on large scale dynamic data driven applications systems (DDDAS, or
InfoSymbiotic systems) governed by partial differential equations (PDEs), e.g., arising in
atmospheric environments. Specifically, our main interests are data assimilation and the
configuration of sensor networks. Data assimilation is the process of dynamically integrating
information from measurements into models. In a variational approach the data assimilation
problem is posed as an inverse problem, where parameters are adjusted such that the model
predictions best fit the measurements. Sensor network configuration is the process of using
model results to dynamically steer the measurement process.

InfoSymbiotic applications are inherently subject to uncertainties associated with imperfect
models and with noisy data. There is an urgent need to quantify, and control, the effect of
model and data errors on the overall DDDAS results, and to fill the gap between the
state-of-the-art modeling techniques (capable of quantifying uncertainty in modeling results)
and the computational tools currently available for InfoSymbiotic applications.

During the first three years of this project (2012-2015) we started to address this need and
developed a rigorous framework for quantifying and reducing uncertainty in the context of
InfoSymbiotic systems (details are given in Section II).

DDDAS integrates computational simulations and physical measurements in a symbiotic feedback
control system. Inverse problems in this framework use data from measurements along with a
numerical model to estimate the parameters or state of a physical system of interest.
Uncertainties in both measurements and the computational model lead to inaccurate estimates.
We developed a goal-oriented a posteriori error estimation methodology for the impact of
different errors on the variational solutions of inverse problems. In the goal-oriented
approach we are interested in estimating the impact of observation and model errors on the
quantity of interest, i.e., on the aspect of interest of the optimal solution.

The variational data assimilation problem optimizes the model states and parameters in order
to obtain predictions that best fit the measurements. In this project we solved the
complementary problem of optimizing the DDDAS process. The strategies used to collect and
process the data are considered to be parameters of the inference system, and are themselves
improved via an additional optimization process. Specifically, we seek to improve the
parameters of the data assimilation system. We have formulated the optimal configuration of
the DDDAS system as an “optimization-constrained optimization problem”. Our algorithm is based
on first and second order adjoints, and on the solution of large linear systems. We also
proposed efficient methods for computing observation impact, including low-rank approximations
of observation-to-analysis sensitivities.

DDDAS  variational  inference  in  real  time  is  hindered  by  costly  forward  and  adjoint  model 
runs.  We  proposed  a  new  parallel-in-time  algorithm  to  speed  up  the  solution  process.  The  original 
4D-Var  problem  is  solved  in  the  augmented  Lagrangian  framework.  To  expose  time  parallelism 
the assimilation window is divided into several sub-intervals. This formulation allows the
forward and the adjoint models to be integrated over different subintervals in parallel.

The construction and validation of an adjoint model is an extremely labor-intensive process.
To address this challenge we proposed a derivative-free 4D-Var(-like) smoothing algorithm that
approximately solves DDDAS variational inference without the need to construct adjoint models.
Specifically, our TR-4D-EnKF algorithm uses a trust-region approach to optimize in ensemble
space.

The variational DDDAS inference solution does not include posterior uncertainty estimates. To
address this challenge we developed new nonlinear filtering and smoothing algorithms that
sample directly from the posterior PDF using a Hybrid Markov-Chain Monte Carlo (HMCMC)
approach. The sampling smoother is implemented efficiently using the same adjoint
computational infrastructure used in 4D-Var.

Consistency of the reduced-order Karush-Kuhn-Tucker conditions with the full-order optimality
conditions is a key ingredient for successful reduced order data assimilation problems. This
translates into accurate low-rank approximations of both the adjoint and forward models,
leading to reduced bases constructed from the dominant eigenvectors of the correlation matrix
of the aggregated snapshots of the full forward and adjoint models. Our work underlines the
importance of incorporating the adjoint information into the construction of the reduced order
basis for performing reduced order 4D-Var data assimilation.

II Results From the DDDAS Project AFOSR FA9550-12-1-0293-DEF (2012-2015)

We summarize here the main results obtained during the first three years of this project,
2012-2015. The research presented below was either fully or partially funded by this project.

II.1 Mathematical framework

We  work  in  a  variational  framework  and  regard  the  inference  problem  as  an  inverse  problem, 
as  follows. 

• The real (physical) system is described by a state $\mathbf{x}^{\rm true}$ (e.g., the
spatio-temporal distribution of wind velocities) and a vector of model parameters of interest
$\theta^{\rm true}$ (e.g., the fields at the initial time). We do not know the real state or
the real parameters, and our goal is to derive information about $\theta^{\rm true}$ from
measurements of $\mathbf{x}^{\rm true}$.

• The prior information encapsulates our current knowledge of the system. Usually the prior
information is contained in a background estimate of the state $\mathbf{x}^{\rm b}$ and the
corresponding background error covariance matrix $\mathbf{B}$.

• The reality is described by a computer model that captures our knowledge about the physical
laws that govern the evolution of the system:

$$ \mathbf{x}_{k+1} = \mathcal{M}_{k,k+1}(\mathbf{x}_k), \quad k = 0, 1, \dots, N-1, \qquad (1) $$

where $\mathcal{M}_{k,k+1}$ represents the model solution operator that propagates the state
$\mathbf{x}_k$ at $t_k$ to the state $\mathbf{x}_{k+1}$ at $t_{k+1}$.

• The sensor network provides observations of some aspects of the real state. Observations are
noisy snapshots of reality available at discrete time instances $t_k$, $k = 1, \dots, N$. The
model state is related to observations by the following relation:

$$ \mathbf{y}_k = \mathcal{H}(\mathbf{x}_k) - \varepsilon_k^{\rm obs}, \quad \varepsilon_k^{\rm obs} = \varepsilon_k^{\rm meas} + \varepsilon_k^{\rm repr}. \qquad (2) $$

The observation operator $\mathcal{H}$ maps the model state space onto the observation space.
The observation error term ($\varepsilon_k^{\rm obs}$) accounts for both measurement and
representativeness errors. Measurement errors are due to imperfect sensors. The
representativeness errors are due to the inaccuracies of the mathematical and numerical
approximations inherent to the model.

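The inference problem. To simplify ideas consider (without loss of generality) that the model
parameters are the initial conditions, $\theta = \mathbf{x}_0$. The inference (data
assimilation) problem is formalized as a model-constrained optimization problem:

$$ \mathbf{x}_0^{\rm a} = \arg\min_{\mathbf{x}_0} \mathcal{J}(\mathbf{x}_0) \quad \text{subject to (1)}, \qquad (3a) $$

$$ \mathcal{J}(\mathbf{x}_0) = \frac{1}{2} \|\mathbf{x}_0 - \mathbf{x}_0^{\rm b}\|^2_{\mathbf{B}_0^{-1}(\mathbf{u})} + \frac{1}{2} \sum_{k=1}^{N} \|\mathcal{H}_k(\mathbf{x}_k, \mathbf{u}) - \mathbf{y}_k\|^2_{\mathbf{R}_k^{-1}(\mathbf{u})}. \qquad (3b) $$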

The first term of the sum (3b) quantifies the departure of the solution $\mathbf{x}_0$ from
the background state $\mathbf{x}_0^{\rm b}$ at the initial time $t_0$. The second term measures
the mismatch between the forecast trajectory (model solutions $\mathbf{x}_k$) and observations
$\mathbf{y}_k$ at all times $t_k$ in the assimilation window. The weighting matrices
$\mathbf{B}_0$ and $\mathbf{R}_k$ need to be predefined, and their quality influences the
accuracy of the resulting analysis. The vector $\mathbf{u}$ represents the parameters of the
DDDAS system, e.g., sensor locations and weights attributed to various data points.
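To make the structure of (3b) concrete, the following minimal sketch evaluates the cost for a scalar state. It is an illustration under stated assumptions, not the project's implementation: the function name `fourdvar_cost`, the linear decay model, the identity observation operator and all numerical values are hypothetical, and the covariances are taken as fixed scalars with no dependence on $\mathbf{u}$.

```python
def fourdvar_cost(x0, xb, b_var, obs, r_var, model, obs_op):
    """Evaluate the strong-constraint 4D-Var cost (3b) for a scalar state."""
    # Background term: 1/2 * (x0 - xb)^2 / B
    cost = 0.5 * (x0 - xb) ** 2 / b_var
    x = x0
    for y_k in obs:                  # observations y_1, ..., y_N
        x = model(x)                 # x_k = M_{k-1,k}(x_{k-1}), the constraint (1)
        cost += 0.5 * (obs_op(x) - y_k) ** 2 / r_var
    return cost


# Hypothetical toy setup: the model halves the state, the state is observed directly.
model = lambda x: 0.5 * x
obs_op = lambda x: x
observations = [1.0, 0.5]            # exactly consistent with x0 = 2

cost_at_truth = fourdvar_cost(2.0, xb=2.0, b_var=1.0, obs=observations,
                              r_var=0.1, model=model, obs_op=obs_op)
cost_elsewhere = fourdvar_cost(1.0, xb=2.0, b_var=1.0, obs=observations,
                               r_var=0.1, model=model, obs_op=obs_op)
print(cost_at_truth, cost_elsewhere)
```

Because the model is imposed exactly, the only free variable is the initial state; every other state along the trajectory is obtained by running the model forward.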

Weak constraint 4D-Var avoids the assumption of a perfect model [16], implicit in the
traditional strong constraint formulation (3), at the expense of solving a larger optimization
problem. The state $\mathbf{x}_k$ at $t_k$ is allowed to differ from the model prediction
$\mathcal{M}_{k-1,k}(\mathbf{x}_{k-1})$. The weak constraint 4D-Var estimates of the states
$\mathbf{x} = [\mathbf{x}_0, \dots, \mathbf{x}_N]$ are the unconstrained minimizer of the
following cost function:

$$ \min_{\mathbf{x}} \mathcal{J}^{\rm w}(\mathbf{x}) = \frac{1}{2} \|\mathbf{x}_0 - \mathbf{x}_0^{\rm b}\|^2_{\mathbf{B}_0^{-1}} + \frac{1}{2} \sum_{k=1}^{N} \|\mathcal{H}_k(\mathbf{x}_k) - \mathbf{y}_k\|^2_{\mathbf{R}_k^{-1}} + \frac{1}{2} \sum_{k=1}^{N} \|\mathbf{x}_k - \mathcal{M}_{k-1,k}(\mathbf{x}_{k-1})\|^2_{\mathbf{Q}_k^{-1}}. \qquad (4) $$

The last term in the cost function (4) corresponds to the contribution of model error to
changing the analysis. The model is not imposed exactly; rather, it is treated as a weak
constraint (i.e., the differences $\mathbf{x}_k - \mathcal{M}_{k-1,k}(\mathbf{x}_{k-1})$ are
penalized in the cost function). The control variables in (4) can be not only the model states
$\mathbf{x}_k$ at each time step, but also the model biases [16].
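The difference from the strong constraint formulation can be seen in a small sketch of (4) for a scalar state, where the whole trajectory is the control variable. The function name `weak_constraint_cost`, the toy model and all numbers are hypothetical, and the covariances are assumed to be constant scalars.

```python
def weak_constraint_cost(states, xb, b_var, obs, r_var, q_var, model, obs_op):
    """Evaluate the weak-constraint cost (4) for a scalar state trajectory.

    states = [x_0, ..., x_N] are all control variables; obs = [y_1, ..., y_N].
    """
    cost = 0.5 * (states[0] - xb) ** 2 / b_var                         # background
    for k in range(1, len(states)):
        cost += 0.5 * (obs_op(states[k]) - obs[k - 1]) ** 2 / r_var    # data misfit
        cost += 0.5 * (states[k] - model(states[k - 1])) ** 2 / q_var  # model defect
    return cost


# Hypothetical toy problem: the model halves the state, the data disagree with it.
model = lambda x: 0.5 * x
obs_op = lambda x: x
obs = [1.2, 0.6]

on_model = [2.0, 1.0, 0.5]      # satisfies the model exactly, misfits the data
off_model = [2.0, 1.1, 0.55]    # accepts a small model defect for a better data fit

args = dict(xb=2.0, b_var=1.0, obs=obs, r_var=0.1, q_var=0.1,
            model=model, obs_op=obs_op)
cost_on = weak_constraint_cost(on_model, **args)
cost_off = weak_constraint_cost(off_model, **args)
print(cost_on, cost_off)
```

In this toy setting the trajectory that departs slightly from the model attains the lower cost, which is exactly the freedom that the penalty term in (4) provides.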

The dynamic configuration of the observation network problem. The inverse problem is posed as
a PDE-constrained nonlinear optimization problem, where model states and parameters are tuned
in order to obtain predictions that best fit the measurements. The complementary problem is to
optimize the strategies used to collect and process the data, so as to improve the performance
of the inversion. Formally, dynamic configuration of the observation network is achieved by
improving the parameters $\mathbf{u}$ of the system (3b), as explained in [4]. We discuss this
in detail in Section II.3.


II.2 A posteriori error estimates for the solution of variational inverse problems

DDDAS integrates computational simulations and physical measurements in a symbiotic feedback
control system. Inverse problems in this framework use data from measurements along with a
numerical model to estimate the parameters or state of a physical system of interest.
Uncertainties in both measurements and the computational model lead to inaccurate estimates.

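Specifically, in practice the evolution of the physical system is described by an imperfect
model

$$ \mathbf{x}_{k+1} = \mathcal{M}_{k,k+1}(\mathbf{x}_k) + \Delta\mathbf{x}_{k+1}(\mathbf{x}_k), \quad k = 0, 1, \dots, N-1, \qquad (5) $$

where $\Delta\mathbf{x}_{k+1}(\mathbf{x}_k)$ represents the (additive) model error at time
$t_{k+1}$. The observations collected by the sensors are also imperfect and contain data
errors $\Delta\mathbf{y}_k$. In practice one solves a perturbed inverse problem of the form:

$$ \widehat{\mathbf{x}}_0^{\rm a} = \arg\min_{\mathbf{x}_0 \in \mathbb{R}^n} \widehat{\mathcal{J}}(\mathbf{x}_0) \quad \text{subject to (5)}, \qquad (6a) $$

$$ \widehat{\mathcal{J}}(\mathbf{x}_0) = \frac{1}{2} \|\mathbf{x}_0 - \mathbf{x}_0^{\rm b}\|^2_{\mathbf{B}_0^{-1}(\mathbf{u})} + \frac{1}{2} \sum_{k=1}^{N} \|\mathcal{H}_k(\mathbf{x}_k, \mathbf{u}) - \mathbf{y}_k - \Delta\mathbf{y}_k\|^2_{\mathbf{R}_k^{-1}(\mathbf{u})}. \qquad (6b) $$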

In [10, 11] we developed a goal-oriented a posteriori error estimation methodology for the
impact of different errors on the variational solutions of inverse problems. Consider a
quantity of interest (QoI) defined by a scalar functional
$\mathcal{E} : \mathbb{R}^m \to \mathbb{R}$ that measures a certain aspect of the optimal
solution value

$$ {\rm QoI} = \mathcal{E}(\mathbf{x}_0^{\rm a}). \qquad (7) $$

In the goal-oriented approach we are interested in estimating the impact of observation and
model errors on the QoI, i.e., on the aspect of interest of the optimal solution. The error in
the QoI is

$$ \Delta\mathcal{E} = \mathcal{E}(\mathbf{x}_0^{\rm a}) - \mathcal{E}(\widehat{\mathbf{x}}_0^{\rm a}), \qquad (8) $$

where $\mathbf{x}_0^{\rm a}$ and $\widehat{\mathbf{x}}_0^{\rm a}$ are the solutions of the
ideal inverse problem (3) and of the perturbed inverse problem (6), respectively.

We have shown in [10, 11] that the error in the QoI is approximated to first order by the sum
of contributions of errors in the forward model, adjoint model, and optimality equation:

$$ \Delta\mathcal{E} \approx \Delta\mathcal{E}_{\rm est} = \Delta\mathcal{E}_{\rm fwd} + \Delta\mathcal{E}_{\rm adj} + \Delta\mathcal{E}_{\rm opt}. \qquad (9) $$

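Moreover, the contributions of errors are

$$ \Delta\mathcal{E}_{\rm fwd} = \sum_{k=1}^{N} \nu_k^T \cdot \Delta\mathbf{x}_k, \qquad (10a) $$

$$ \Delta\mathcal{E}_{\rm adj} = -\sum_{k=0}^{N} \mu_k^T \cdot \left(\mathbf{H}_k^T \mathbf{R}_k^{-1} \Delta\mathbf{y}_k\right) + \sum_{k=0}^{N-1} \mu_k^T \cdot \left(\Delta\mathbf{x}_{k+1}\right)_{\mathbf{x}_k}^T \lambda_{k+1}, \qquad (10b) $$

$$ \Delta\mathcal{E}_{\rm opt} = -\zeta^T \left(\Delta\mathbf{x}_1\right)_{\mathbf{x}_0}^T \lambda_1. \qquad (10c) $$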


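We see that data errors contribute to the error in the adjoint model. The forward model errors
contribute to errors in both the forward and the adjoint equations. The “impact factors”
$\zeta \in \mathbb{R}^m$, $\mu_k \in \mathbb{R}^n$ for $k = 0, \dots, N$, and
$\nu_k \in \mathbb{R}^n$ for $k = 0, \dots, N$ are calculated by the following algorithm:

Linear system: $\left(\nabla^2_{\mathbf{x}_0, \mathbf{x}_0} \mathcal{J}\right) \cdot \zeta = \nabla_{\mathbf{x}_0} \mathcal{E}$;

Tangent linear model: $\mu_0 = -\zeta$; $\mu_{k+1} = \mathbf{M}_{k,k+1}\, \mu_k$, $k = 0, \dots, N-1$;

Second order adjoint: $\nu_N = \mathbf{H}_N^T \mathbf{R}_N^{-1} \mathbf{H}_N\, \mu_N$,
$\nu_k = \mathbf{M}_{k,k+1}^T \nu_{k+1} + \left(\mathbf{M}_{k,k+1}^T \lambda_{k+1}\right)_{\mathbf{x}_k} \mu_k + \mathbf{H}_k^T \mathbf{R}_k^{-1} \mathbf{H}_k\, \mu_k$, $k = N-1, \dots, 0$. (11)

Here $\mathbf{M}_{k,k+1}$ and $\mathbf{H}_k$ denote the tangent linear model and observation
operators, and $\lambda_k$ the first order adjoint variables.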

We applied this methodology to real scenarios. Figure 1 illustrates a data assimilation
calculation carried out using the Weather Research and Forecasting model (WRF-VAR). The errors
in the meridional wind component observation from GEOAMV and their impact on the QoI are shown.

II.3 Dynamic configuration of sensor networks via optimization of DDDAS parameters

The variational data assimilation problem (3) optimizes the model states and parameters in
order to obtain predictions that best fit the measurements.

In our work [4] we solve the complementary problem of optimizing the DDDAS process. The
strategies used to collect and process the data are considered to be parameters of the
inference system, and are themselves improved via an additional optimization process.
Specifically, we seek to improve the parameters $\mathbf{u}$ of the data assimilation system
(3b).

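We measure the quality of the inverse solution (3) by the discrepancy between the model
forecast (initialized from the analysis $\mathbf{x}_0^{\rm a}$) and a set of high-quality
verification data $\mathbf{y}_v^{\rm verif}$ collected at verification time $t_v$. This
discrepancy can be measured by the quadratic “verification” cost function

$$ \Psi(\mathbf{u}) = \Psi\left(\mathbf{x}_0^{\rm a}(\mathbf{u})\right) = \frac{1}{2} \left\|\mathcal{H}_v(\mathbf{x}_v) - \mathbf{y}_v^{\rm verif}\right\|^2_{\mathbf{C}}. \qquad (12) $$

The function $\Psi$ depends directly on $\mathbf{x}_0^{\rm a}(\mathbf{u})$, and indirectly on
the system parameters $\mathbf{u}$.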
Figure 1: Our a posteriori error estimator [10, 11] applied to a real scenario: data
assimilation with WRFDA. The assimilation window is from 18:00 on 04/27/2011 to 00:00 on
04/28/2011. The simulation domain covers the continental U.S. with a horizontal grid
resolution of 60 km. We consider the meridional wind component (V) observations taken by the
geostationary satellite GEOAMV (geostationary atmospheric motion vector). Panels: (a) Optimal
initial V (meridional component of the wind field) at ground level (V component of
$\mathbf{x}_0^{\rm a}$). (b) Data errors in GEOAMV observations ($\Delta\mathbf{y}_k$).
(d) Contribution of data errors to the error in the analysis QoI
($\mu_k^T \mathbf{H}_k^T \mathbf{R}_k^{-1} \Delta\mathbf{y}_k$), shown for GEOAMV V at
2011-04-27 18:00:00.

In [4] we have formulated the optimal configuration of the DDDAS system as the following
“optimization-constrained optimization problem”:

$$ \mathbf{u}^{\rm opt} = \arg\min_{\mathbf{u}} \Psi(\mathbf{x}_0^{\rm a}) \quad \text{subject to} \quad \mathbf{x}_0^{\rm a} = \arg\min_{\mathbf{x}_0} \mathcal{J}(\mathbf{x}_0, \mathbf{u}). \qquad (13) $$

Our algorithm to solve (13) is based on first and second order adjoints, and on the solution
of large linear systems. In [5, 6] we proposed efficient methods for computing observation
impact, including low-rank approximations of observation-to-analysis sensitivities.
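The nested structure of the optimization-constrained optimization problem can be illustrated with a deliberately small sketch. It does not use adjoints: the inner problem is a scalar quadratic solved in closed form, and the outer problem is solved by brute-force search, both swapped in for simplicity. The names `analysis` and `verification_cost`, the two-sensor setup (one accurate sensor, one faulty) and all numerical values are hypothetical, and the parameters are assumed to act as weights on the individual observation terms.

```python
def analysis(u, xb=1.2, b_var=1.0, obs=(1.0, 5.0), r_var=0.1):
    """Inner problem: closed-form minimizer of a scalar cost J(x0, u).

    u = (u_1, u_2) are hypothetical weights multiplying each observation term;
    both sensors observe the initial state directly.
    """
    num = xb / b_var + sum(w * y / r_var for w, y in zip(u, obs))
    den = 1.0 / b_var + sum(w / r_var for w in u)
    return num / den


def verification_cost(u, y_verif=1.0):
    """Outer objective: mismatch between the analysis and verification data."""
    return 0.5 * (analysis(u) - y_verif) ** 2


# Outer problem: search over the weight given to the second (faulty) sensor.
candidates = [i / 20 for i in range(21)]            # u_2 in [0, 1]
u2_opt = min(candidates, key=lambda u2: verification_cost((1.0, u2)))
print(u2_opt)
```

In this toy case the outer search drives the weight of the faulty sensor to the lower end of the admissible range, since any positive weight pulls the analysis away from the verification data.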

Figure 2 illustrates the application of our methodology to detect faulty sensors. Faulty
sensors are not visible in the inference solution, but are detectable via our
sensitivity-to-data approach.


(a) The smooth 4D-Var increment ($\mathbf{x}_0^{\rm a} - \mathbf{x}_0^{\rm b}$) does not
indicate any problem with data.

(b) Supersensitivity field exhibits a clear structure.

(c) Sensitivity to observations field clearly identifies the locations of the two faulty
sensors.


Figure  2:  The  observation  impact  methodology  to  identify  possibly  faulty  sensors.  From  @. 


Figure 3 illustrates how the optimization-constrained optimization process can dynamically
adjust the data covariances (weights) and can define optimal sensor network configurations.
Both procedures lead to a considerable decrease of the forecast error (12), and therefore to
considerably improved performance of the DDDAS system.


II.4 Parallel-in-time algorithm for fast variational DDDAS inference


DDDAS variational inference in real time is hindered by costly forward and adjoint model runs.
In [12] we proposed a new parallel-in-time algorithm to speed up the solution process. The
concept is illustrated in Figure 4a.

The original 4D-Var problem in (3a) is solved in the augmented Lagrangian framework
[19, Section 17.3]. To expose time parallelism the assimilation window is divided into $N$
sub-intervals, namely,

$$ [t_0, t_N] = [t_0, t_1] \cup \dots \cup [t_{N-1}, t_N]. \qquad (14) $$


DISTRIBUTION A: Distribution approved for public release.

Figure 3: Optimization of DDDAS parameters for a two dimensional shallow water system. From
[4]. Panels: (a) The DDDAS system specifies initially equal noise levels, unaware of larger
observation errors in an area. (b) Covariances optimized via solving (13) automatically reduce
the weight given to noisy data. (c) Decrease of forecast error (12) over the outer iterations
shows a considerably improved performance of the DDDAS system. (d) Initial sensor locations.
(e) Optimized sensor locations via solving (13). (f) Decrease of forecast error (12), i.e.,
improved DDDAS performance.

The optimization variables are the forward model and adjoint model states at the interval
boundaries, $\mathbf{x} = [\mathbf{x}_0, \dots, \mathbf{x}_N]$ and
$\lambda = [\lambda_0, \dots, \lambda_N]$, respectively. Solution continuity equations across
interval boundaries are added as constraints. This formulation allows the forward and the
adjoint models to be integrated over different subintervals in parallel. The optimization
proceeds in cycles of inner and outer iterations and alternately updates the $\mathbf{x}$ and
$\lambda$ variables. The augmented Lagrangian approach leads to a different formulation of the
variational data assimilation problem than weakly constrained 4D-Var.
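The source of time parallelism can be sketched as follows, assuming a toy scalar model: once the states at the interval boundaries are treated as independent variables, each sub-interval can be integrated on its own, and the mismatches at the boundaries are the continuity constraints that the augmented Lagrangian iterations would drive to zero. The names `propagate` and `continuity_defects` and the explicit Euler model are hypothetical; only the forward model is shown, and no augmented Lagrangian update is performed.

```python
from concurrent.futures import ThreadPoolExecutor


def propagate(x, steps, dt=0.1, rate=-1.0):
    """Hypothetical forward model: explicit Euler steps for dx/dt = rate * x."""
    for _ in range(steps):
        x = x + dt * rate * x
    return x


def continuity_defects(boundary_states, steps_per_interval):
    """Integrate all sub-intervals concurrently and return the jumps
    x_{k+1} - M_{k,k+1}(x_k) at the interval boundaries."""
    starts = boundary_states[:-1]
    with ThreadPoolExecutor() as pool:
        ends = list(pool.map(lambda x: propagate(x, steps_per_interval), starts))
    return [x_next - end for x_next, end in zip(boundary_states[1:], ends)]


# A set of boundary states taken from one sequential model run has zero defects ...
consistent = [1.0]
for _ in range(3):
    consistent.append(propagate(consistent[-1], 5))
defects_consistent = continuity_defects(consistent, 5)

# ... while an arbitrary guess of the boundary states does not.
defects_guess = continuity_defects([1.0, 1.0, 1.0, 1.0], 5)
print(defects_consistent, defects_guess)
```

Threads are used here only to show that the sub-interval integrations are independent of each other; for a costly model one would distribute them over processes or nodes.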

Results from applying parallel-in-time 4D-Var data assimilation to the shallow water on the
sphere model are illustrated in Figure 4b. A speedup factor of two is obtained by a
combination of parallel and traditional approaches. Note that this factor of two is obtained
on top of the parallel speedup due to traditional spatial domain decomposition parallelization
applied to the forward and adjoint models [13].


(a) Concept: the parallel-in-time solution of the 4D-Var data assimilation problem.

(b) Data assimilation with the shallow water on the sphere model: 2x overall speed-up.


Figure 4: Parallel-in-time 4D-Var applied to 2D shallow water equations [12].


II.5 Adjoint-free variational inference

The construction and validation of an adjoint model is an extremely labor-intensive process.
To address this challenge, in [8] we proposed a derivative-free 4D-Var(-like) smoothing
algorithm that approximately solves DDDAS variational inference without the need to construct
adjoint models. Specifically, our TR-4D-EnKF algorithm uses a trust-region approach to
optimize in ensemble space.

II.6 Hamiltonian Monte-Carlo sampling filter and smoother

The variational DDDAS inference solution does not include posterior uncertainty estimates. To
address this challenge we developed new nonlinear filtering and smoothing algorithms that
sample directly from the posterior PDF using a Hybrid Markov-Chain Monte Carlo (HMCMC)
approach. The sampling smoother is implemented efficiently using the same adjoint
computational infrastructure used in 4D-Var.
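The sampling idea can be sketched for a scalar Gaussian posterior, where the exact answer is known. This is a generic textbook Hamiltonian (hybrid) Monte Carlo sampler with a leapfrog integrator, not the filter or smoother developed in the project; the function name `hmc_sample`, the step size, the trajectory length and the toy prior and observation are all assumptions. The gradient of the negative log-posterior, which in a 4D-Var setting would come from an adjoint run, is available here in closed form.

```python
import math
import random


def hmc_sample(neg_log_post, grad, x0, n_samples, step=0.2, n_leapfrog=5, seed=0):
    """Draw samples from exp(-neg_log_post) with a basic HMC sampler (unit mass)."""
    rng = random.Random(seed)
    samples, x = [], x0
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)                    # draw an auxiliary momentum
        x_new, p_new = x, p
        # Leapfrog integration of the Hamiltonian dynamics.
        p_new -= 0.5 * step * grad(x_new)
        for i in range(n_leapfrog):
            x_new += step * p_new
            if i < n_leapfrog - 1:
                p_new -= step * grad(x_new)
        p_new -= 0.5 * step * grad(x_new)
        # Metropolis accept/reject based on the change in total energy.
        h_old = neg_log_post(x) + 0.5 * p * p
        h_new = neg_log_post(x_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples


# Hypothetical scalar problem: prior N(0, 1) and one direct observation y = 2 with
# unit error variance, so the exact posterior is N(1, 0.5).
xb, b_var, y, r_var = 0.0, 1.0, 2.0, 1.0
neg_log_post = lambda x: 0.5 * (x - xb) ** 2 / b_var + 0.5 * (x - y) ** 2 / r_var
grad = lambda x: (x - xb) / b_var + (x - y) / r_var

samples = hmc_sample(neg_log_post, grad, x0=0.0, n_samples=2000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)    # sample statistics should be close to the exact values 1 and 0.5
```

Unlike the variational solution, which returns a single minimizer, the ensemble of samples gives direct access to posterior statistics such as the variance.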


II.7 Optimization with reduced order model surrogates

Consistency of the reduced-order Karush-Kuhn-Tucker conditions with the full-order optimality
conditions is a key ingredient for successful reduced order data assimilation problems. This
translates into accurate low-rank approximations of both the adjoint and forward models,
leading to reduced bases constructed from the dominant eigenvectors of the correlation matrix
of the aggregated snapshots of the full forward and adjoint models.

Our recent work [14, 15] underlines the importance of incorporating the adjoint information
into the construction of the reduced order basis for performing reduced order 4D-Var data
assimilation (see red versus blue lines in Figure 5b). The new shallow water ROM data
assimilation system provides analyses similar to those produced by the full resolution data
assimilation system in one tenth of the computational time (see black versus blue lines in
Figure 5b). Another order of magnitude in savings is expected with three dimensional models.
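Why the adjoint snapshots matter for the basis can be seen in a three-dimensional toy sketch. The snapshot vectors below are made up, the helper names are hypothetical, and the dominant eigenvectors of the snapshot correlation matrix are obtained by power iteration with deflation, used here as a dependency-free stand-in for the SVD one would use in practice.

```python
def matvec(a, v):
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def normalize(v):
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def dominant_basis(snapshots, rank, iters=200):
    """Leading eigenvectors of the correlation matrix C = sum_s s s^T."""
    n = len(snapshots[0])
    c = [[sum(s[i] * s[j] for s in snapshots) for j in range(n)] for i in range(n)]
    basis = []
    for _ in range(rank):
        v = normalize([1.0] * n)
        for _ in range(iters):                      # power iteration
            v = normalize(matvec(c, v))
        lam = sum(x * y for x, y in zip(v, matvec(c, v)))
        # Deflation: remove the eigenpair just found.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
        basis.append(v)
    return basis


def projection_error(basis, x):
    """Norm of the part of x that the reduced basis cannot represent."""
    r = list(x)
    for u in basis:
        coeff = sum(a * b for a, b in zip(u, x))
        r = [ri - coeff * ui for ri, ui in zip(r, u)]
    return sum(ri * ri for ri in r) ** 0.5


# Made-up snapshots: forward states and one adjoint state in a different direction.
forward = [[1.0, 1.0, 0.0], [2.0, 2.0, 0.0]]
adjoint = [[0.0, 1.0, 1.0]]

forward_only = dominant_basis(forward, rank=1)
aggregated = dominant_basis(forward + adjoint, rank=2)

err_forward_only = projection_error(forward_only, adjoint[0])
err_aggregated = projection_error(aggregated, adjoint[0])
print(err_forward_only, err_aggregated)
```

The basis built from forward snapshots alone cannot represent the adjoint state, whereas the basis built from the aggregated snapshots reproduces it to rounding error, which is the property the reduced order optimality conditions rely on.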


(a) Concept: the proposed solution of the 4D-Var data assimilation problem uses ROMs as
surrogates in an inner optimization loop. Diagram elements: high-resolution nonlinear
trajectory, reduced basis U, reduced-order nonlinear model, reduced-order adjoint model,
iterative minimization algorithm, high-resolution nonlinear forecast.

(b) Cost function decrease versus number of iterations for 4D-Var applied to 2D shallow water
equations. It is essential to incorporate adjoint information in the reduced basis.


Figure 5: Reduced order 4D-Var applied to 2D shallow water equations [15]. Optimization based
on traditional reduced order models (constructed from the forward solution snapshots) is
inaccurate (red line). However, reduced order analysis (blue line) is as accurate as the full
model one (green line) when adjoint information is incorporated in the reduced bases. The
reduced order analysis is ten times faster than the full order one.